An efficient algorithm for the parallel solution of 
high-dimensional differential equations 



Stefan Klus a , Tuhin Sahai b -*, Cong Liu b , Michael Dellnitz 3 

"Institute for Industrial Mathematics, University of Paderborn, 33095 Paderborn, Germany 
b United Technologies Research Center, East Hartford, CT 06108, USA 



X 



t— I Abstract 

The study of high-dimensional differential equations is challenging and difficult due to the analyt- 
ical and computational intractability. Here, we improve the speed of waveform relaxation (WR), 
a method to simulate high-dimensional differential-algebraic equations. This new method termed 
adaptive waveform relaxation (AWR) is tested on a communication network example. Further 
we propose different heuristics for computing graph partitions tailored to adaptive waveform re- 
^vq laxation. We find that AWR coupled with appropriate graph partitioning methods provides a 

speedup by a factor between 3 and 16. 

Keywords: waveform relaxation, adaptive windowing, graph partitioning, Petri nets, parallel 
£^ algorithms 
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^ 1. Introduction 

Over the past few years, several attempts have been made to study differential equations 
of high dimensionality. These equations naturally occur in models for systems as diverse as 
£\j metabolic networks (TJ, communication networks (2), fluid turbulence [3|, heart dynamics 0], 

chemical systems [5| and electrical circuits |[6) to name but a few. Traditional approaches ap- 
proximate the full system by dynamical systems of lower dimension. These model reduction 
techniques |7| include proper orthogonal decomposition (POD) along with Galerkin projections 
Q), Krylov subspace methods [8 1, and balanced truncation or balanced POD (see e.g. 15)). 

In this work, we accelerate a parallel algorithm, for the simulation of differential-algebraic 
equations, called waveform relaxation ll6l fT0l[TT1l . In waveform relaxation, instead of approxi- 
mating the original system by a lower-dimensional model, the methodology is to distribute the 
computations for the entire system on multiple processors. Each processor solves only a part 
of the problem. The solutions corresponding to subsystems on other processors are regarded as 
inputs whose waveforms are given by the solution of the previous iteration. This step is one 
iteration of the procedure. At the end of each iteration the solutions are distributed among the 
processors. The procedure is repeated until convergence is achieved. The initial waveforms are 
typically chosen to be constant. 

This paper is organized as follows: Based on previously derived error bounds for waveform 
relaxation (cf. l[T3"l [l4ln . we propose and demonstrate a new algorithm to break the time interval 
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for simulation [0, T] into smaller subintervals. We call this method adaptive waveform relax- 
ation. It is important to note that this method is different from windowing methods discussed 
in [ 10 1 . Subsequently, we analyze and present time and memory complexity of waveform relax- 
ation techniques and the dependence of the convergence behavior on the decomposition of the 
system. Furthermore, we introduce different graph partitioning heuristics in order to efficiently 
generate an appropriate splitting. We demonstrate that the combination of graph partitioning 
along with adaptive waveform relaxation results in an improved performance over traditional 
waveform relaxation and standard windowing techniques. 



2. Error bounds 

For an ordinary differential equation of the form i = f(x), f : M." t-* W, the iteration method 
described in the introduction can be written as 

x* +1 =<f>(x k+1 ,x k ), (1) 

with <p : W." x W t-* W and (p(x, x) = f(x). The standard Picard-Lindelof iteration, for example, 
is given by <p{x,y) = f(y). Convergence is, by definition, achieved if \\x k+1 - x k \\ < s for a 
predefined threshold s. This procedure can be used to solve differential-algebraic equations as 
well. For a more detailed overview on waveform relaxation we refer to ||6|[10). We assume that 
the splitting is Lipschitz continuous, i.e. there exist constants /j > and rj > such that 

Mx,y) - <f>(x,y)\\ < n\\x - x\\ + rj\\y - y\\. (2) 

Let x be the exact solution of the differential equation and define E k to be the error of the A;-th 
iterate, that is 

E k = x k - x. (3) 

It is well known that the iteration given by Eqn.[T]converges superlinearly (evident in Proposition 
2.1) to the exact solution and that the error is bounded. Convergence results and error bounds 
for waveform relaxation have previously been derived in |[T0l[TT1[T3l[l4l . For the purpose of this 
paper the following version of the convergence result will be useful. 

Proposition 2.1. Assuming that the splitting cf> satisfies the Lipschitz condition, the norm of the 
error \\Ek\\ on the interval [0, T] is bounded as follows 

C k k T k 

\\E k \\ < —L—\\Eo\\, (4) 
k\ 



wi 



thC = e^. 



Remark 2.2. In Eqn. [4] it is important to note that k\ will eventually dominate the numerator 
such that convergence is guaranteed. 



3. Adaptive waveform relaxation 

By Eqn.|4]the error of standard waveform relaxation crucially depends on T, The longer the 
time interval, the greater is the number of iterations needed to bound the error below a desired 
tolerance. This fact is well known and in ifTUl it is suggested to subdivide the time interval 
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[0, T] into windows [0, 7^], [T\, Tj\, . . • , [^v-i, 7VL The authors pick an initial interval of ^ 
and then perform waveform relaxation on the small interval. If the solution has not converged 
in 5 iterations, then the time window is halved. If the size of the interval is too large (based on 
data storage requirements), the window length is reduced. If the current window satisfies the 
above requirements, the same window length is used for the next interval. This approach does 
not take into account the slope of the solution and the error made by the initial waveform. We 
aim to adaptively determine the size of the next time interval based on the previously computed 
solution and on Eqn. [4] 

Let us be more precise. In our procedure, we too first perform waveform relaxation on a small 
interval given by [0, T\]. Define AT; = T, - TVi. Upon convergence of waveform relaxation on 
the interval [TVi, T,], we estimate the length of the next time interval Ar, + i as follows: Firstly, we 
compute an interpolating polynomial of order / using I + 1 equally spaced points tj, j = 0, . . . , /. 
In our implementation, a quadratic polynomial with fo = T/, t\ = 7 1 , - j^ATf, and ti - T/ - -j^ AT/ 
is used. This interpolating polynomial is also utilized as an initial guess for the waveform over 
the next time interval. Using Eqn.|4] we then choose AT/ + i such that 

(e^r 1 AT i+1 ) r 

\\E i+l , r \\ := * : -||£ i+ i,oll < e. (5) 

r! 

In other words, given a desired number of iterations r, one can estimate the length of the next 
time interval if ||.Ej+i,oll> and r\ are known. To estimate the error E !+ i o(£)> we compute the 
difference between jc* +1 (f) and the interpolating polynomial. This can be accomplished using the 
formula 

0<V +1 (g),** +1 (fl) 

£;+i,o(0 = (1+1)! 

where 

O)(t) = (t-to)(t-h)...(t-ti) (7) 

and 0® = is the Z-th derivative of the splitting <p with respect to t (cf. lfT51 ). Additionally, 
we assume that in the above equation exists. We estimate the magnitude of this term using 
finite differences at the end of the time interval just computed. The Lipschitz constants /i and 
rj also need to be estimated in order to get a good guess for the interval length. For nonlinear 
problems, the Lipschitz constants are in general not directly available. Below, we will focus on 
linear ordinary differential equations so that the Lipschitz constants are given by the norms of 
the matrix splitting, as we will show in Section|4] 

With an estimate of all the variables in Eqn. [5] we can now compute the length of the next 
window. Initially, we set ATj+i = 2A!T, and compute Ej + \${Ti + A7Vi). This gives an estimate 
for the magnitude of H^+i.oll for the next time interval. If the resulting error ||£; + i ;r || is larger than 
the threshold e, we repeat the process using an adapted interval length Ar, + i as described in the 
following algorithm. 



Algorithm 3.1. To compute the length AT i+l , execute the following steps: 

J-/ 

20 L 



1. Set AT M = 2ATj and 5 = ^AT t . 



2. Evaluate Ej+^ofT; + Ar, + i) using Eqn.|6]to estimate ||i?; + i,oll and compute ||£j+i >r || with the 
aid of Eqn. [5] 

3. If ll-Ez+i,, !! > b and ATi+i > jAT 1 ;, set A7/ + i = AT^+i - S and repeat step 2. 
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We define the minimal window length to be Ar i+ i = ^T, The above procedure gives a 
sequence of time intervals [0, 7^], [T\, T{\, ■ ■ ■ , [T v -\, T v ], where T v - T, on which waveform 
relaxation is performed with an initial "guess" waveform provided by an extrapolation of the 
solution on the previous interval. 

Intuitively, this procedure works by taking small steps in regions where the solution changes 
rapidly (large derivative) and large steps in regions where the solution changes slowly (small 
derivative). 

4. Partitioning and convergence 

In this section, we analyze the time and memory complexity of waveform relaxation and the 
influence of the splitting on the convergence. It is shown that the optimal splitting depends on 
both the integration scheme and the step size. Since there exists no efficient method to compute 
the optimal splitting directly, we introduce different heuristics in order to generate appropriate 
decompositions. Here, we focus on linear systems of the form 

m = Q*(t), (8) 

with Q e Ml"*", x e R", t e [0, r], and the initial condition x(0) = jco- Linear equations arise in 
models of various dynamical systems. We will consider in particular systems which are derived 
from generalized stochastic Petri nets. In order to solve the initial value problem with the aid of 
waveform relaxation or adaptive waveform relaxation, the system is split according to PQP T = 
M + N and the partitioned system 

i* +1 (f) = Mx k+l (t) + Nx k (t) (9) 

is solved iteratively. Here, P is a permutation matrix and M is a block diagonal matrix. Hence, 
<f>(x k+l ,x k ) = Mx k+l + Nx k . Furthermore, the Lipschitz constants p and r\ are the appropriate 
matrix norms of M and N, respectively. The matrix splitting can be regarded as a graph parti- 
tioning problem where each block of M represents a part or subsystem and N the connections 
between different parts. Let p be the number of blocks where the 2-th block is of size «,, that is 
n = 2f = i Then the 2-th equation can be written as 

x k+ \t) = M, 7 xf +1 (f) + Yj N H^ ( 10 ) 
j*i 

with M H e R"' x \ x t e W, N u e M"' x "', and xj e R"^ for j = 1, ... ,p and + i. 

Let us begin with a remark on the time and memory complexity of waveform relaxation. 
Our aim is to derive conditions under which one expects waveform relaxation (in a parallel 
implementation) to give an answer faster than solving the entire system of equations (in a serial 
implementation). For simplicity, we consider the explicit Euler method with a fixed step size h. 
The same argument can be repeated for other integration schemes with the same result. 

Elementary calculations show that for the full system ([H} the cost of the numerical solution 
on the interval [0, t] amounts to 

C E = (n 2 + n)j. (11) 
h 

We now compute the time complexity of waveform relaxation. The cost of a single Euler step 
for the 2-th subsystem ( fT0) i is Cwr, = «f + «;(» - «;) + Thus, to compute K iterations for all 
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blocks, the total cost would be 



C WR = K(n 2 + n)- = KC E . 



(12) 



Let us assume that there are p processors, and let the Z-th block be the largest, then the time 
complexity in the parallel case is given by 



C W Rp = K(n in + m) 



h 



(13) 



It follows that if n\K < n, then the waveform relaxation procedure is advantageous. Note that K, 
or the number of iterations needed for convergence, strongly depends on the actual decomposi- 
tion. 

The memory complexity in the linear case is easy to classify. In general, one needs to store a 
big matrix of size n 2 . On a single processor, waveform relaxation has the same memory require- 
ments as the full system. For the parallel case, however, the maximum storage needed is n\ x n. 
This can be a major advantage if «/ <K n and the matrix can be stored in the processor cache. It 
is also important to note that the above analysis does not take communication costs into account. 

Remark 4.1. In a nutshell, standard waveform relaxation is of advantage if 

i) ri[K <n for time complexity, 
ii) ni <k n for memory complexity, 

where n\ is the size of the largest block of the decomposed system and K is the number of 
iterations needed for convergence. 

Let us now analyze the influence of the decomposition on the convergence. We discretize the 
system (|9]l using a fixed step size h and an integration scheme of the form 



m+1 ' 



(14) 



where C\, C 2 , and C3 are matrices which may depend on M, N, and h. Let s — jr be the number 
of time steps and X k — xi ... x k s ] the discretized waveform. Furthermore, define 



X k = 



(15) 



Proposition 4.2. For an integration scheme of the form \\A\ the discrete waveform relaxation 
can be written as X k+l = AX k + b, with 



A = 



C 3 

C1C3 + C 2 
C 2 C^ + C1C2 



q-'c 3 + c\- 2 c 2 



c 3 

C1C3 + c 2 



c 3 



C\Ct,+C\C2 C1C3 + C2 C3 



(16) 
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and 



(Ci + C 2 )x Q 
Ci(Ci + C 2 )x Q 
C 2 JC l +C 2 )x 



q- 1 (Ci + c 2 )x 



(17) 



Proof. ByEqn.rni 





Ci 



C, 



c 3 



c 3 


x k + 


(Ci + Ci) x 



c 2 c 3 







V 




d 



u 

and thus X k+l - (I - UY l VX k + (I - U)~ l d, where I is the identity matrix. Using the Neumann 
series and the fact that U is nilpotent, we get 



.v-i 



d-u)- 1 = 



/=() 



/ 

Ci / 
Cj 



c 2 

L 1 



c 



.v-l 



C 2 Ci / 



Hence, A = (I — U) 1 V and b — (I — U) 1 J are of the aforementioned form. 
Example 4.3. The following integration schemes are of the form ( JT4] ): 

i) Explicit Euler method: Ci = (7 + fcM), C 2 = hN, and C 3 = 0. 

ii) Implicit Euler method: Ci = (7 - /jAfT 1 , C 2 = 0, and C 3 = (7 - hMy x hN. 
hi) Trapezoidal rule: Ci = (7 - |M) _1 (7 + |M) and C 2 = C 3 = (7 - |M)- J |AT. 



□ 



To begin with, we discretize the system using the explicit Euler method. Since C 3 = 0, A is 
a strictly lower-triangular block Toeplitz matrix. It follows that the spectral radius p(A) is and 
in particular A s = 0. Therefore, waveform relaxation converges, independent of the partitioning, 
after at most 5+1 iterations, i.e. 



X s = AX S -* +b = A S X" +A s -'b + ---+Ab + b, 



S-&0 . AS-ll 



X s+l = AX S + b= A s b +A s - l b + ---+Ab + b = X s . 



(18) 



If we replace the explicit Euler method by the implicit Euler method, then the spectral radius 
of A is equal to the spectral radius of C 3 = (7 - hM)~ x hN. To accelerate the convergence of 
waveform relaxation, the matrix Q should be decomposed such that the spectral radius of C 3 is 
minimized. Observe that the optimal splitting depends on the step size h. 

If we, on the other hand, use the trapezoidal rule, then the block diagonal of the iteration 
matrix A is given by C 3 = (/ - |M) _1 |iV. That is, the system should be partitioned in a way that 



the spectral radius of the new matrix C3 is minimized. Thus, the optimal splitting depends also 
on the integration scheme. 

Since the iteration matrices A of the implicit Euler or the trapezoidal rule based waveform 
relaxation are highly nonnormal, their spectral properties do not predict the convergence behavior 
appropriately. For such matrices and operators the pseudospectrum is a more useful tool fl6l . 

Definition 4.4. Given a matrix A and s > 0, A € C is defined to be an s-pseudoeigenvalue of A 
if A is an eigenvalue of A + E for a matrix E with \\E\\ < s. 

There are several different equivalent definitions of pseudo-eigenvalues (cf. 17710 . The set 
A £ (A) of all e-pseudoeigenvalues is called the e-pseudospectrum and p e (A) = max{||z|| | z e 
A £ (A)} is called the e-pseudospectral radius. While the e-pseudospectrum of a normal matrix is 
the union of e-balls around the eigenvalues, the pseudospectrum of a nonnormal matrix can be 
sensitive to small perturbations IfTSl . 

In Section [5] the matrix splittings with the best spectral and pseudospectral properties are 
used for comparison. However, there exists no efficient method to minimize the spectral radius 
or the pseudospectral radius directly. We propose different heuristics to find a decomposition 
which is close to the optimal splitting. The partitioning of a directed graph with respect to a 
given cost function is still an open problem, in particular there are no sophisticated spectral clus- 
tering methods for directed graphs (cf. |fT9ll ). Therefore, we combine different graph clustering 
and partitioning methods, namely horizontal-vertical decomposition, spectral clustering, and the 
graph partitioning library PARTY, to generate appropriate splittings. 

Horizontal-vertical decomposition as described in 11201 identifies the subsystem hierarchy 
of dynamical systems. The decomposition is equivalent to the computation of the strongly 
connected components of the graph 0(0, where &(Q) = (23, €) with 03 = {t>i, . . . , t)„} and 
2; = {(o,, Vj) I qij + 0). The strongly connected components can be computed efficiently using 
the depth-first search. 

Spectral Clustering is a popular partitioning heuristic for undirected graphs, based on spectral 
or algebraic graph theory. Spectral clustering utilizes the information obtained from eigenvalues 
and eigenvectors of graph-related matrices such as the graph Laplacian for partitioning. For a 
detailed description we refer to [21 1. Recently, an efficient distributed spectral clustering algo- 
rithm that overcomes the drawbacks associated with random walk based approaches has been 
proposed by one of the authors in |22 1. 

PARTY is a graph partitioning library that provides several different multilevel graph parti- 
tioning strategies combining local and global heuristics for undirected graphs ||23| . The idea of 
the multilevel approach is to coarsen the initial graph by collapsing matching vertices so that 
global partitioning heuristics can be applied efficiently. Subsequently, combined vertices are 
split during the refinement process and local methods like the Kernighan-Lin heuristic or the 
Helpful-Set algorithm are applied to further improve the partition. 

If the matrix Q is reducible, then the system is decomposed first using the horizontal-vertical 
decomposition in order to exploit the directionality of the graph on a coarse level. Then, de- 
pending on the application, either the spectral clustering method or PARTY is applied to the in- 
dividual strongly connected components. Since both methods are confined to undirected graphs, 
the strongly connected components have to be regularized first by omitting the orientation of the 
edges. If it is important to generate a balanced partition of the graph, then PARTY is, in general, 
better suited. If, on the other hand, the network is quite inhomogeneous and the spectral method 
computes an unbalanced splitting while PARTY is forced to generate a balanced splitting, then 
spectral partitioning is advantageous. 
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For large networks with several strongly connected components, the horizontal-vertical de- 
composition is crucial for the quality of the decomposition. If the partitioning methods are di- 
rectly applied to the graph &(Q+Q T ), all information on the directed signal flow and the different 
subsystems is lost. In the next section we will demonstrate the impact of the horizontal-vertical 
decomposition on the convergence of waveform relaxation. 

5. Applications and results 

To illustrate the adaptive waveform relaxation procedure and the spectral and pseudospec- 
tral properties of the iteration matrices, we analyze a linear ordinary differential equation that 
is used for the transient analysis of a continuous-time Markov chain (CTMC). The continuous- 
time Markov chain is derived from a generalized stochastic Petri net (GSPN) l24ll . GSPN is 
a popular model for performance analysis of complex concurrent systems. It has been used to 
model and analyze communication protocols [25], parallel programs [26], multiprocessor archi- 
tectures |22), and manufacturing systems ll28l . The reachability graph of a GSPN with an initial 
marking (state) consists of vertices corresponding to its reachable markings and directed edges 
corresponding to transitions. It has been proved that there exists a one-to-one mapping between 
the reachability graph of a GSPN and the CTMC 1291 . 




Figure 1 : A GSPN model of a client-server system. 

Let m(i) be the probability that the CTMC is in state i at time t. Let r,j, i + j, be the transition 
rate from state i to state j and r„ = - Yij^i r ij- Given the transition rate matrix R = [ry] of 
a CTMC, the state probability distribution at time t denoted by jr(t) = [n\(t),n2{t), . . . ,7T„(t)] 
satisfies 

nit) = n{t)R. (19) 
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We apply the waveform relaxation techniques to solve the above equation. Due to state space 
explosion, the number of differential equations becomes extremely large even for a GSPN of 
moderate size lT30l . This makes them an ideal application to demonstrate the waveform relaxation 
procedure. Figure [T] shows the GSPN that we used for experiment. It models a server shared by 
three clients. The corresponding CTMC of the GSPN has 24 states and the resulting transition 
rate matrix R is sparse. 

For simplicity, we rewrite Eqn. 19 as i(f) = Qx(t). In order to demonstrate the adaptive wave- 



form relaxation procedure and to compare it to standard waveform relaxation, we decompose the 
GSPN into two subsystems of the same size. 

Firstly, we compute the solution of the system using standard waveform relaxation and a fixed 
step size h = 1CT 3 . The initial waveform is assumed to be constant over [0, T], i.e. x°(f) = xo, 
where T = 1. We iterate until the difference between two successive iterations falls below 
the predefined tolerance s = 10 -4 . The solution is shown in Figure^. As one can see, the 
state probability distributions approach constants, i.e. equilibria are eventually reached. Standard 
waveform relaxation takes (averaged over 10 simulations) 0.622 sec. 

The solution is now computed using adaptive waveform relaxation. We use the same toler- 
ance of e = 10~ 4 and an initial window of [0, X], The solution and the intervals computed by 
adaptive waveform relaxation are shown in Figure ^Z)p. Averaging again over 10 simulations, we 
find that adaptive waveform relaxation takes approximately 0.103 sec to compute the solution, 
i.e. over 6 times faster than standard waveform relaxation. 



a) waveform relaxation 



b) adaptive waveform relaxation 




0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.B 0.9 1 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 



Figure 2: Comparison of solutions obtained from waveform relaxation and adaptive waveform relaxation. 



In the following, we analyze different partitions of the GSPN to illustrate the influence of the 
matrix splitting on the convergence of waveform relaxation. The best balanced bipartition of the 
GSPN for the implicit Euler based waveform relaxation and /? = 10 1 is given by 

9\ = [1 2 5 7 10 11 13 16 18 19 22 24 | 3 4 6 8 9 12 14 15 17 20 21 23] , (20) 

meaning that the first 12 states belong to the first and the remaining 12 states to the second part, 
whereas for h = 10~ 2 the bipartition with the lowest spectral radius is 

f> 2 = [1 2 3 4 5 6 7 10 13 16 19 22 | 8 9 1 1 12 14 15 17 18 20 21 23 24] . (21) 

The splittings Pj and Vi are shown in Figure[3^ and[3}3, respectively. If we use the trapezoidal 
rule, then for h = 10 _1 and h = 10~ 2 the optimal splittings are again given by ( P\ and Pi- 
Nevertheless, for h — 5 • 10~ 2 , for instance, P\ is better suited for the implicit Euler based 
waveform relaxation while P2 is better suited for the trapezoidal rule based waveform relaxation. 
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This example illustrates that the optimal splitting depends on the step size and on the integration 
scheme. To compute these optimal partitions, we compared all balanced decompositions of the 
network. For high-dimensional systems this is clearly not feasible. 

a)h= 10-', p = 0.23632 V)h= l(T 2 ,p = 0.00505 




Figure 3: Optimal splittings P\ andfi- 

From now on, we denote the waveform relaxation operator of the implicit Euler based method 
as A\ and the operator of the trapezoidal rule based method as A^. Although the spectral radius 
of A2 is only half as large as the spectral radius of A; for small step sizes h, both methods require 
approximately the same number of iterations for convergence. Figure [4] shows the dependence of 
the pseudospectral radii on the number of time steps for splitting P\ . If the number of time steps 
is large, then the pseudospectral radii of the iteration matrices are almost equal. The pseudospec- 
tral radii were computed using Higham's Matrix Computation Toolbox PP . Here, the parameter 
s for the computation of the e-pseudoeigenvalues was set to 1CT 3 . 

Below, we compare P\ and P2 to the splittings generated by the heuristics described in 
Section|4] The GSPN is irreducible and the spectral partitioning yields 

Pi = [1 2 345 6 8 9 10 11 12 15 16 17 18 21 | 7 13 14 19 20 22 23 24], (22) 

while PARTY generates a balanced splitting 

P 4 = [1 2 3 4 5 6 9 10 11 15 16 18 | 7 8 12 13 14 17 19 20 21 22 23 24] . (23) 

In Figure [5] the optimal splittings P\ and Pi are compared to the heuristic splittings P$ and 
P4. To illustrate the impact of this approach, two random splittings P5 and P^ are evaluated. 
Figure [5^ shows the spectral and the pseudospectral radii of the waveform relaxation operators 
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10 20 30 40 50 60 



Figure 4: Dependence of the pseudospectral radii on the number of time steps. 



using the implicit Euler method. The number of time steps was set to s — 50. Figure [5J5 shows 
the number of iterations k required for convergence of standard waveform relaxation. Although 
the sizes of the parts of P3 and are different, the results are virtually equivalent. Furthermore, 
the results are close to the results of the optimal splittings V\ and V^- 



a) p{A\) and p s (A\) b) number of iterations k 




Figure 5: Comparison of different splittings, a) Spectral radius (solid line) and pseudospectral radius (dashed line), b) 
Number of iterations required for convergence. 



Now we combine both methods, the graph partitioning heuristics and adaptive waveform 
relaxation, and compare it to standard waveform relaxation. In addition, we subdivide the time 
interval [0, T] into 20, 25, and 30 windows of the same size and use standard waveform relaxation 
for each subinterval. We refer to these methods as FWRi, FWR2, and FWR3, respectively. We 
set again T = 1 and s = 10~ 4 . Adaptive waveform relaxation generates — depending on the 
partitioning — between 24 and 27 windows. The runtime results are shown in Table[T] Note that 
the influence of the splitting on the convergence of adaptive waveform relaxation is much smaller 
than the influence on the standard waveform relaxation procedure. 

For this example, waveform relaxation using a fixed window size performs only slightly 
worse than adaptive waveform relaxation since the state probability distribution quickly con- 
verges to the equilibrium so that the extrapolation of the solution has almost no effect. However, 
the appropriate size of the windows is in general unknown prior to the simulation. Using adaptive 
waveform relaxation, the window sizes are generated and adjusted automatically. 
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Table 1: Runtime results for the GSPN in seconds. 





P\ 


Pi 


Pi 


Pa 


Ps 


n 


AWR 


0.104 


0.103 


0.103 


0.104 


0.105 


0.106 


FWR[ 


0.133 


0.134 


0.133 


0.133 


0.136 


0.137 


FWR 2 


0.121 


0.126 


0.125 


0.125 


0.130 


0.132 


FWR 3 


0.118 


0.119 


0.119 


0.122 


0.126 


0.125 


WR 


0.619 


0.622 


0.621 


0.622 


0.869 


1.071 



To demonstrate the impact of the extrapolation and the adaptive windowing technique, we 
simulate 10 higher-dimensional networks with standard and adaptive waveform relaxation. 
For comparison, we subdivide the time interval into the same number of equally sized windows 
and use again standard waveform relaxation for each subinterval (FWR). The results are shown 
in Table [2] We decompose each system into p = 2 n scc blocks, with n scc being the number of 
strongly connected components. The default partition is defined to be the balanced decomposi- 
tion where the variables are assigned to the blocks without a previous permutation of the matrix. 



Table 2: Runtime results for further examples in seconds. 









HVD+PARTY 


Default partition 




n 




AWR 


FWR 


WR 


AWR 


FWR 


WR 


2i 


100 


1 


0.45 


0.88 


5.89 


0.53 


0.95 


8.14 


Qi 


100 


10 


0.54 


0.92 


6.61 


0.58 


1.01 


10.11 


Qi 


200 


1 


0.98 


2.01 


12.01 


1.04 


3.01 


18.81 


24 


200 


10 


1.05 


2.28 


24.74 


1.18 


2.68 


42.28 


25 


400 


1 


8.77 


20.16 


204.25 


9.42 


21.02 


251.74 


26 


400 


10 


6.93 


14.07 


84.25 


9.16 


19.98 


219.85 


27 


800 


1 


27.13 


69.18 


346.62 


36.21 


74.27 


589.85 


2s 


800 


10 


17.97 


43.41 


326.05 


18.32 


44.87 


604.26 


29 


1600 


1 


78.31 


152.02 


948.33 


96.01 


210.11 


1550.62 


Qio 


1600 


10 


67.80 


172.89 


1434.59 


73.06 


203.06 


2722.87 



If the network consists of several strongly connected components, then the horizontal-vertical 
decomposition is of big importance for the convergence of waveform relaxation. To illustrate 
the influence of the horizontal-vertical decomposition, we simulate a 400 dimensional example 
which consists of 20 strongly connected components. If we apply PARTY directly to decompose 
the system into 40 subsystems, then standard waveform relaxation takes approximately 110.93 
sec and adaptive waveform relaxation 6.77 sec. If we, on the other hand, decompose the system 
first using the horizontal-vertical decomposition and apply PARTY to the individual strongly 
connected components, then the simulation takes only 68.46 sec or 4.41 sec, respectively. 

In summary, the combination of the horizontal-vertical decomposition and the different par- 
titioning methods for undirected graphs enables a reliable and efficient splitting of the system for 
the subsequent standard or adaptive waveform relaxation. 

6. Conclusions 

The performance of waveform relaxation depends on many different influencing factors. One 
important criterion is the proper subdivision of the integration interval into smaller time windows. 
In this work, we proposed an adaptive waveform relaxation method which, depending on the 
previous time interval, generates appropriately sized time windows. In regions where the solution 
changes rapidly, small windows are computed and in regions where the solution changes slowly, 
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large windows are computed. Decomposition of the system is also of great importance for the 
convergence of waveform relaxation. We analyzed the spectra and pseudospectra of the resulting 
waveform relaxation operators and introduced different graph partitioning heuristics in order to 
speed up the simulation. It was shown that it is possible to speed up the computation of high- 
dimensional differential equations using adaptive waveform relaxation along with appropriate 
partitioning heuristics. 
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